Developer Release Note: Text Encoding Converter Manager 1.4.2

(February 9, 1999 - P. Edberg)

Version 1.4.2 of the Text Encoding Converter Manager (TEC) is included with Veronica (Mac OS 8.6) and with Macintosh Runtime for Java (MRJ), version 2.1. This note describes external changes from TEC 1.4 (included with Mac OS 8.5) to TEC 1.4.2.

Note: TEC version 1.4.1 was included with some preliminary releases of MRJ; alpha versions of TEC 1.4.1 were also included with early Veronica seeds. However, TEC 1.4.1 was not shipped as part of any final release, and it has been superseded by TEC 1.4.2.


1. Interface file changes

The TEC 1.4.2 release includes an updated version of TextCommon.h, which is intended to replace the older version in Universal Interfaces 3.2 for developers who are using TEC 1.4.2. The TEC release also includes copies of UnicodeConverter.h, TextEncodingConverter.h, and TextEncodingPlugin.h, but these have not changed since TEC 1.4. (TextEncodingPlugin.h was not included with Universal Interfaces 3.2, but UnicodeConverter.h and TextEncodingConverter.h are the same as in Universal Interfaces 3.2.)

The TextEncodingVariant constants used for variants of the MacRoman, MacIcelandic, MacCroatian, MacRomanian, and MacVT100 encodings have been redefined to better handle the way that these encodings were changed in Mac OS 8.5 (CURRENCY SIGN was replaced with EURO SIGN).

  1. MacRoman: For Mac OS Roman, TEC 1.4 introduced kMacRomanCurrencySignVariant (1); kMacRomanStandardVariant (0) was redefined to mean the EURO SIGN variant. TEC 1.4.2 introduces kMacRomanEuroSignVariant (2), and replaces kMacRomanStandardVariant with kMacRomanDefaultVariant (also 0); the latter is a "meta-value" and will resolve to either kMacRomanCurrencySignVariant or kMacRomanEuroSignVariant depending on the system version.
  2. MacIcelandic: For Mac OS Icelandic, TEC 1.4.2 introduces four real variants: kMacIcelandicStdCurrSignVariant, kMacIcelandicStdEuroSignVariant, kMacIcelandicTTCurrSignVariant, and kMacIcelandicTTEuroSignVariant. It also replaces kMacIcelandicStandardVariant (0) with kMacIcelandicStdDefaultVariant (also 0), a meta-value that resolves to either kMacIcelandicStdCurrSignVariant or kMacIcelandicStdEuroSignVariant depending on the system version. Similarly, it replaces kMacIcelandicTrueTypeVariant (1) with kMacIcelandicTTDefaultVariant (also 1), a meta-value that resolves to either kMacIcelandicTTCurrSignVariant or kMacIcelandicTTEuroSignVariant depending on the system version.
  3. MacCroatian, MacRomanian, and MacVT100: For each of these, TEC 1.4.2 introduces constants for a CURRENCY SIGN variant (1), a EURO SIGN variant (2), and a default variant meta-value (0).
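
The way a meta-value resolves to a real variant can be sketched as follows. This is a minimal illustration, not the TEC implementation. The three constants use the values given in this note (in real code they come from TextCommon.h). The function ResolveMacRomanVariantSketch, its systemVersion parameter (assumed to be BCD-style, so 0x0850 means Mac OS 8.5), and the choice of 8.5 as the exact switch point are hypothetical stand-ins for whatever ResolveDefaultTextEncoding does internally.

```c
/* Values as given in this note; real code would get them from TextCommon.h. */
enum {
    kMacRomanDefaultVariant      = 0,  /* meta-value, never a real variant */
    kMacRomanCurrencySignVariant = 1,
    kMacRomanEuroSignVariant     = 2
};

/* Hypothetical helper: map a MacRoman variant to a real variant.
   systemVersion is assumed to be BCD-style (0x0850 == Mac OS 8.5). */
static unsigned long ResolveMacRomanVariantSketch(unsigned long variant,
                                                  unsigned long systemVersion)
{
    if (variant != kMacRomanDefaultVariant)
        return variant;                     /* already a real variant */

    return (systemVersion >= 0x0850)
        ? kMacRomanEuroSignVariant          /* 8.5 and later: EURO SIGN */
        : kMacRomanCurrencySignVariant;     /* earlier systems: CURRENCY SIGN */
}
```

Under these assumptions, the default variant resolves to kMacRomanCurrencySignVariant for a system version of 0x0810 and to kMacRomanEuroSignVariant for 0x0860, while a real variant passed in is returned unchanged.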

The result of the above changes is that we now have the following meta-values for variants of Mac OS encodings, which resolve to real variants as shown in the figure below:

In addition, there is a change to allow applications to detect when the files in the Text Encodings folder may not match the version of the TEC code:

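  1. Two fields have been added at the end of the TECInfo struct returned by TECGetInfo, and the format code kTECInfoCurrentFormat has been incremented from 1 to 2. The new fields hold the lowest and highest versions found among the files in the Text Encodings folder (see section 2, item 4).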
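
An application might perform the version check along the following lines. This is only a sketch under stated assumptions: TECInfoSketch is a hypothetical stand-in for the relevant part of TECInfo, all of its field names are illustrative rather than taken from TextCommon.h, and treating any difference from the code version as a possible mismatch is an assumed policy. Only the format value 2 comes from this note.

```c
/* Hypothetical stand-in for the parts of TECInfo relevant here.
   The real struct is declared in TextCommon.h; these names are illustrative. */
typedef struct {
    unsigned short format;              /* 2 when the new fields are present */
    unsigned short codeVersion;         /* version of the TEC code (assumed field) */
    unsigned short lowestFileVersion;   /* lowest version in Text Encodings folder */
    unsigned short highestFileVersion;  /* highest version in Text Encodings folder */
} TECInfoSketch;

enum { kFormatWithFileVersions = 2 };   /* value of kTECInfoCurrentFormat in TEC 1.4.2 */

/* Returns  1 if the files may not match the code,
            0 if every file matches the code version,
           -1 if the struct predates the new fields and nothing can be said. */
static int TextEncodingFilesMayMismatch(const TECInfoSketch *info)
{
    if (info->format < kFormatWithFileVersions)
        return -1;                      /* TEC 1.4 or earlier: fields absent */

    return (info->lowestFileVersion  != info->codeVersion ||
            info->highestFileVersion != info->codeVersion) ? 1 : 0;
}
```

Checking the format code first matters because a TECInfo with format 1 does not contain the new fields at all.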

2. Implementation changes & fixes, TextCommon & UnicodeConverter libraries

  1. Support the new meta-variants described in section 1. ResolveDefaultTextEncoding maps TextEncodings with meta-variants to TextEncodings with real variants. UpgradeScriptInfoToTextEncoding always returns TextEncodings with real variants.
  2. Add mapping tables to support the EuroSign variants of MacIcelandic, MacCroatian, MacRomanian, and MacVT100 encodings.
  3. Fix some problems with the way that CreateUnicodeToTextRunInfo, CreateUnicodeToTextRunInfoByEncoding, and CreateUnicodeToTextRunInfoByScriptCode generate their lists of all available encodings (i.e. when their first parameter is 0 or has its high bit set). Now these lists specify the appropriate set of TextEncodings with real variants. This solves a problem in TEC 1.4 that existed when these functions were asked to create a UnicodeToTextRunInfo with all available encodings and with a preference for a non-zero variant of MacRoman.
  4. When the TextCommon library is initialized, find the highest and lowest version (from the 'vers' resource) of the files in the Text Encodings folder. Put this information into the new fields of TECInfo.
  5. ChangeTextToUnicodeInfo wasn't updating one field of the TextToUnicodeInfo. This could lead to incorrect exclusion or inclusion of mapping subtables. When changing from MacRoman to MacJapanese, for example, it led to exclusion of the MacJapanese characters 0x8150-0x8163.
  6. In ConvertFromUnicodeToText[Run], don't scan ahead unnecessarily. The scanner that determines Unicode text elements was not exiting immediately when it encountered characters such as controls, which cannot be part of a longer text element.
  7. Change the way TEC manages its global data; in particular, eliminate the TextCommonGlobals fragment. Also, make sure that per-context globals are correctly initialized.
  8. Eliminate a dangling pointer problem that could occur when loading tables from a Text Encodings file whose name had non-ASCII characters.
  9. Solve some fragment loading and override problems encountered when TEC 1.4.2 was used with Mac OS 8.5 or 8.5.1 systems booted from HFS Extended volumes.

3. Implementation fixes, TextEncodingConverter library and plugins

  1. TECCreateConverter returned kTECNoConversionPathErr when trying to create a converter whose output encoding had a non-zero variant. It is now fixed to handle non-zero variants correctly.
  2. TECCreateConverterFromPath returned kTECNoConversionPathErr if any TextEncoding in the supplied path had an element (base, variant, or format) which was a meta-value, such as kTextEncodingUnicodeDefault or kMacRomanDefaultVariant. Fixed.
  3. The BigFive/MacChineseTrad sniffer was treating every byte in the range 0x40-0x7E or 0xA1-0xFE as a feature, and every byte in the range 0x80-0xA0 or 0xFF as an error. This counted every two-byte character as two features, and also counted some valid MacChineseTrad single-byte characters (e.g. 0x81, 0xA0, 0xFF) as errors. Fixed.
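
The sniffer problem in item 3 can be illustrated with a small sketch. CountPerByte follows the per-byte classification described above for TEC 1.4. CountPerCharacter is a hypothetical correction, not the actual TEC 1.4.2 code: it assumes lead bytes lie in 0xA1-0xFE, counts a valid lead/trail pair as one feature, and treats only the three single-byte values named above (0x81, 0xA0, 0xFF) as harmless. All type and function names here are illustrative.

```c
#include <stddef.h>

typedef struct { int features; int errors; } SniffCounts;

/* Byte ranges that the old sniffer counted as features. */
static int IsTrailByte(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Behavior described for TEC 1.4: classify every byte independently. */
static SniffCounts CountPerByte(const unsigned char *p, size_t n)
{
    SniffCounts c = {0, 0};
    size_t i;
    for (i = 0; i < n; i++) {
        if (IsTrailByte(p[i]))
            c.features++;
        else if ((p[i] >= 0x80 && p[i] <= 0xA0) || p[i] == 0xFF)
            c.errors++;
    }
    return c;
}

/* Hypothetical corrected counting: one feature per two-byte character. */
static SniffCounts CountPerCharacter(const unsigned char *p, size_t n)
{
    SniffCounts c = {0, 0};
    size_t i = 0;
    while (i < n) {
        unsigned char b = p[i];
        if (b >= 0xA1 && b <= 0xFE) {               /* assumed lead-byte range */
            if (i + 1 < n && IsTrailByte(p[i + 1])) {
                c.features++;                       /* whole character = 1 feature */
                i += 2;
                continue;
            }
            c.errors++;                             /* lead byte without valid trail */
        } else if (b >= 0x80 && b != 0x81 && b != 0xA0 && b != 0xFF) {
            c.errors++;                             /* remaining high bytes stay errors */
        }
        i++;
    }
    return c;
}
```

For the byte sequence 0xA4 0x40 0xA0 (one two-byte character followed by a valid single-byte character), CountPerByte reports two features and one error, while CountPerCharacter reports one feature and no errors.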